feature correspondence
CVD-SfM: A Cross-View Deep Front-end Structure-from-Motion System for Sparse Localization in Multi-Altitude Scenes
Li, Yaxuan, Huang, Yewei, Gaudel, Bijay, Jafarnejadsani, Hamidreza, Englot, Brendan
We present a novel multi-altitude camera pose estimation system, addressing the challenges of robust and accurate localization across varied altitudes when only considering sparse image input. The system effectively handles diverse environmental conditions and viewpoint variations by integrating the cross-view transformer, deep features, and structure-from-motion into a unified framework. To benchmark our method and foster further research, we introduce two newly collected datasets specifically tailored for multi-altitude camera pose estimation; datasets of this nature remain rare in the current literature. The proposed framework has been validated through extensive comparative analyses on these datasets, demonstrating that our system achieves superior performance in both accuracy and robustness for multi-altitude sparse pose estimation tasks compared to existing solutions, making it well suited for real-world robotic applications such as aerial navigation, search and rescue, and automated inspection.
I. INTRODUCTION Structure-from-motion (SfM) [1], [2], [3] has been receiving extensive attention in the field of computer vision and robotics; it is pivotal in various real-world applications, including autonomous navigation [4], rapid mapping after natural disasters for situational awareness, detailed preservation of historical landmarks [5], and immersive virtual reality (VR) experiences. As research progresses, it provides a cornerstone in achieving pose estimation and reconstruction, standing out as a particularly effective technique, specifically when input is limited to sparse images. While conventional SfM approaches deliver impressive results under conditions of abundant image overlap, they often struggle with sparse input captured at vastly different altitudes. In these scenarios, the drastic viewpoint differences limit shared visual features, making it difficult to establish reliable correspondences.
Feature Geometry for Stereo Sidescan and Forward-looking Sonar
Norman, Kalin, Mangelson, Joshua G.
In this paper, we address stereo acoustic data fusion for marine robotics and propose a geometry-based method for projecting observed features from one sonar to another for a cross-modal stereo sonar setup that consists of both a forward-looking and a sidescan sonar. Our acoustic geometry for sidescan and forward-looking sonar is inspired by the epipolar geometry for stereo cameras, and we leverage relative pose information to project where an observed feature in one sonar image will be found in the image of another sonar. Additionally, we analyze how both the feature location relative to the sonar and the relative pose between the two sonars impact the projection. From simulated results, we identify desirable stereo configurations for applications in field robotics like feature correspondence and recovery of the 3D information of the feature. Field robotic applications, such as localization and mapping, in underwater environments face significant challenges due to the complex and dynamic nature of the marine domain.
Event-based Stereo Visual-Inertial Odometry with Voxel Map
Zhang, Zhaoxing, Wang, Xiaoxiang, Zhang, Chengliang, Guo, Yangyang, Yuan, Zikang, Yang, Xin
The event camera, renowned for its high dynamic range and exceptional temporal resolution, is recognized as an important sensor for visual odometry. However, the inherent noise in event streams complicates the selection of high-quality map points, which critically determine the precision of state estimation. To address this challenge, we propose Voxel-ESVIO, an event-based stereo visual-inertial odometry system that utilizes voxel map management, which efficiently selects high-quality 3D points. Specifically, our methodology utilizes voxel-based point selection and voxel-aware point management to collectively optimize the selection and updating of map points on a per-voxel basis. These synergistic strategies enable the efficient retrieval of noise-resilient map points with the highest observation likelihood in current frames, thereby ensuring the state estimation accuracy. Extensive evaluations on three public benchmarks demonstrate that our Voxel-ESVIO outperforms state-of-the-art methods in both accuracy and computational efficiency.
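The per-voxel selection idea in the abstract above can be illustrated with a minimal Python sketch. It is not the Voxel-ESVIO implementation: the function name `select_per_voxel`, the `scores` array (a stand-in for whatever observation-likelihood estimate a system might use), and the voxel size are all assumptions made for illustration.

```python
import numpy as np

def select_per_voxel(points, scores, voxel_size):
    """Return, for each occupied voxel, the index of its highest-scoring point.

    points: (N, 3) array of 3D map points.
    scores: (N,) array; a hypothetical per-point quality score.
    voxel_size: edge length of the cubic voxels.
    """
    # Integer voxel coordinates; floor keeps negative coordinates consistent.
    keys = np.floor(points / voxel_size).astype(np.int64)
    best = {}
    for i, key in enumerate(map(tuple, keys)):
        # Keep only the best-scoring point seen so far in this voxel.
        if key not in best or scores[i] > scores[best[key]]:
            best[key] = i
    return sorted(best.values())

points = np.array([
    [0.1, 0.1, 0.1],   # voxel (0, 0, 0)
    [0.2, 0.3, 0.4],   # voxel (0, 0, 0), higher score
    [1.5, 0.1, 0.1],   # voxel (1, 0, 0)
    [-0.1, 0.0, 0.0],  # voxel (-1, 0, 0)
])
scores = np.array([1.0, 3.0, 2.0, 5.0])
selected = select_per_voxel(points, scores, voxel_size=1.0)
print(selected)
```

Hashing points by integer voxel coordinates keeps the selection cost linear in the number of points and bounds how many map points any one region of space can contribute.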
Pro3D-Editor: A Progressive-Views Perspective for Consistent and Precise 3D Editing
Zheng, Yang, Huang, Mengqi, Chen, Nan, Mao, Zhendong
Text-guided 3D editing aims to precisely edit semantically relevant local 3D regions, which has significant potential for various practical applications ranging from 3D games to film production. Existing methods typically follow a view-indiscriminate paradigm: editing 2D views indiscriminately and projecting them back into 3D space. However, they overlook the different cross-view interdependencies, resulting in inconsistent multi-view editing. In this study, we argue that ideal consistent 3D editing can be achieved through a progressive-views paradigm, which propagates editing semantics from the editing-salient view to other editing-sparse views. Specifically, we propose Pro3D-Editor, a novel framework, which mainly includes Primary-view Sampler, Key-view Render, and Full-view Refiner. Primary-view Sampler dynamically samples and edits the most editing-salient view as the primary view. Key-view Render accurately propagates editing semantics from the primary view to other key views through its Mixture-of-View-Experts Low-Rank Adaption (MoVE-LoRA). Full-view Refiner edits and refines the 3D object based on the edited multi-views. Extensive experiments demonstrate that our method outperforms existing methods in editing accuracy and spatial consistency.
GSFeatLoc: Visual Localization Using Feature Correspondence on 3D Gaussian Splatting
Lee, Jongwon, Bretl, Timothy
In this paper, we present a method for localizing a query image with respect to a precomputed 3D Gaussian Splatting (3DGS) scene representation. First, the method uses 3DGS to render a synthetic RGBD image at some initial pose estimate. Second, it establishes 2D-2D correspondences between the query image and this synthetic image. Third, it uses the depth map to lift the 2D-2D correspondences to 2D-3D correspondences and solves a perspective-n-point (PnP) problem to produce a final pose estimate. Results from evaluation across three existing datasets with 38 scenes and over 2,700 test images show that our method significantly reduces both inference time (by over two orders of magnitude, from more than 10 seconds to as fast as 0.1 seconds) and estimation error compared to baseline methods that use photometric loss minimization. Results also show that our method tolerates large errors in the initial pose estimate of up to 55° in rotation and 1.1 units in translation (normalized by scene scale), achieving final pose errors of less than 5° in rotation and 0.05 units in translation on 90% of images from the Synthetic NeRF and Mip-NeRF360 datasets and on 42% of images from the more challenging Tanks and Temples dataset. I. INTRODUCTION Visual localization is the process of determining the pose (position and orientation) of a query image with respect to a previously reconstructed scene (i.e., a map).
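The third step described above, lifting 2D-2D matches to 2D-3D matches with the rendered depth map, can be sketched in a few lines of Python. This is only an illustration under a pinhole camera model, not the authors' code: the function name `lift_to_3d`, the intrinsics matrix `K`, and the `cam_to_world` pose convention are assumptions.

```python
import numpy as np

def lift_to_3d(pixels, depth, K, cam_to_world):
    """Back-project pixels of the rendered image to world-frame 3D points.

    pixels: (N, 2) array of (u, v) coordinates in the rendered image.
    depth: (H, W) rendered depth map (z along the optical axis).
    K: (3, 3) pinhole intrinsics (assumed model).
    cam_to_world: (4, 4) pose at which the synthetic image was rendered.
    """
    u, v = pixels[:, 0], pixels[:, 1]
    # Look up depth at the nearest pixel for each match.
    z = depth[np.rint(v).astype(int), np.rint(u).astype(int)]
    # Invert the pinhole projection to get camera-frame coordinates.
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z], axis=1)
    # Move the points into the world frame.
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam @ R.T + t

K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
depth = np.full((100, 100), 2.0)        # flat scene 2 units away
pose = np.eye(4)
pose[:3, 3] = [1.0, 0.0, 0.0]           # rendering camera shifted along x
rendered_px = np.array([[50.0, 50.0], [75.0, 50.0]])
world_pts = lift_to_3d(rendered_px, depth, K, pose)
print(world_pts)
```

Each lifted world point, paired with the matching pixel in the query image, forms one 2D-3D correspondence. A PnP solver (for example OpenCV's `cv2.solvePnPRansac`) could then estimate the query pose from these pairs.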
Revisiting invariances and introducing priors in Gromov-Wasserstein distances
Demetci, Pinar, Tran, Quang Huy, Redko, Ievgen, Singh, Ritambhara
Gromov-Wasserstein distance has found many applications in machine learning due to its ability to compare measures across metric spaces and its invariance to isometric transformations. However, in certain applications, this invariance property can be too flexible, thus undesirable. Moreover, the Gromov-Wasserstein distance solely considers pairwise sample similarities in input datasets, disregarding the raw feature representations. We propose a new optimal transport-based distance, called Augmented Gromov-Wasserstein, that allows for some control over the level of rigidity to transformations. It also incorporates feature alignments, enabling us to better leverage prior knowledge on the input data for improved performance. We present theoretical insights into the proposed metric. We then demonstrate its usefulness for single-cell multi-omic alignment tasks and a transfer learning scenario in machine learning.
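For context, the standard discrete Gromov-Wasserstein objective is reproduced below in common notation (the symbols are not taken from the abstract). It shows why the distance depends only on pairwise within-space distances, and is therefore invariant to isometries and blind to the raw features. The augmented variant proposed in the paper is not reproduced here.

```latex
% Standard discrete Gromov-Wasserstein distance between
% (X, d_X, \mu) and (Y, d_Y, \nu); \Pi(\mu,\nu) is the set of couplings
% with marginals \mu and \nu.
\mathrm{GW}_p^p(\mu, \nu)
  = \min_{\pi \in \Pi(\mu, \nu)}
    \sum_{i,k} \sum_{j,l}
    \bigl| d_X(x_i, x_k) - d_Y(y_j, y_l) \bigr|^p \,
    \pi_{ij} \, \pi_{kl}
```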
Feature Correspondence: A Markov Chain Monte Carlo Approach
When trying to recover 3D structure from a set of images, the most difficult problem is establishing the correspondence between the measurements. Most existing approaches assume that features can be tracked across frames, whereas methods that exploit rigidity constraints to facilitate matching do so only under restricted camera motion. In this paper we propose a Bayesian approach that avoids the brittleness associated with singling out one "best" correspondence, and instead considers the distribution over all possible correspondences. We treat both a fully Bayesian approach that yields a posterior distribution, and a MAP approach that makes use of EM to maximize this posterior. We show how Markov chain Monte Carlo methods can be used to implement these techniques in practice, and present experimental results on real data.
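The idea of sampling from a distribution over correspondences, instead of committing to a single best match, can be illustrated with a toy Metropolis sampler over permutations. This sketch is not the paper's algorithm: it matches two small point sets directly under an assumed isotropic Gaussian noise model, and the function name `sample_correspondences`, the swap proposal, and `sigma` are illustrative choices.

```python
import numpy as np

def sample_correspondences(a, b, sigma=0.5, n_iter=5000, seed=0):
    """Estimate marginal match probabilities P(a[i] <-> b[k]) by Metropolis sampling.

    a, b: (n, d) point sets assumed to be related by an unknown permutation.
    sigma: assumed standard deviation of the Gaussian measurement noise.
    """
    rng = np.random.default_rng(seed)
    n = len(a)
    # Negative log-likelihood of pairing a[i] with b[k].
    cost = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1) / (2 * sigma**2)
    perm = rng.permutation(n)           # current assignment: a[i] <-> b[perm[i]]
    counts = np.zeros((n, n))
    for _ in range(n_iter):
        # Propose swapping the partners of two randomly chosen points.
        i, j = rng.choice(n, size=2, replace=False)
        old = cost[i, perm[i]] + cost[j, perm[j]]
        new = cost[i, perm[j]] + cost[j, perm[i]]
        # Metropolis rule: accept with probability min(1, exp(old - new)).
        if np.log(rng.random()) < old - new:
            perm[i], perm[j] = perm[j], perm[i]
        counts[np.arange(n), perm] += 1
    return counts / n_iter

a = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
true_perm = np.array([2, 0, 3, 1])
b = a[true_perm]                        # b[k] is a[true_perm[k]]
marginals = sample_correspondences(a, b)
print(marginals.round(2))
```

Each row of `marginals` approximates the posterior over partners for one point, so ambiguous matches appear as spread-out rows instead of being forced into a single hard assignment.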